250 research outputs found
The Emergence of Norms via Contextual Agreements in Open Societies
This paper explores the emergence of norms in agents' societies when agents
play multiple -even incompatible- roles in their social contexts
simultaneously, and have limited interaction ranges. Specifically, this article
proposes two reinforcement learning methods for agents to compute agreements on
strategies for using common resources to perform joint tasks. The computation
of norms by considering agents' playing multiple roles in their social contexts
has not been studied before. To make the problem even more realistic for open
societies, we do not assume that agents share knowledge on their common
resources. So, they have to compute semantic agreements towards performing
their joint actions. %The paper reports on an empirical study of whether and
how efficiently societies of agents converge to norms, exploring the proposed
social learning processes w.r.t. different society sizes, and the ways agents
are connected. The results reported are very encouraging, regarding the speed
of the learning process as well as the convergence rate, even in quite complex
settings
Probabilistic Timed Automata with Clock-Dependent Probabilities
Probabilistic timed automata are classical timed automata extended with
discrete probability distributions over edges. We introduce clock-dependent
probabilistic timed automata, a variant of probabilistic timed automata in
which transition probabilities can depend linearly on clock values.
Clock-dependent probabilistic timed automata allow the modelling of a
continuous relationship between time passage and the likelihood of system
events. We show that the problem of deciding whether the maximum probability of
reaching a certain location is above a threshold is undecidable for
clock-dependent probabilistic timed automata. On the other hand, we show that
the maximum and minimum probability of reaching a certain location in
clock-dependent probabilistic timed automata can be approximated using a
region-graph-based approach.Comment: Full version of a paper published at RP 201
The Impatient May Use Limited Optimism to Minimize Regret
Discounted-sum games provide a formal model for the study of reinforcement
learning, where the agent is enticed to get rewards early since later rewards
are discounted. When the agent interacts with the environment, she may regret
her actions, realizing that a previous choice was suboptimal given the behavior
of the environment. The main contribution of this paper is a PSPACE algorithm
for computing the minimum possible regret of a given game. To this end, several
results of independent interest are shown. (1) We identify a class of
regret-minimizing and admissible strategies that first assume that the
environment is collaborating, then assume it is adversarial---the precise
timing of the switch is key here. (2) Disregarding the computational cost of
numerical analysis, we provide an NP algorithm that checks that the regret
entailed by a given time-switching strategy exceeds a given value. (3) We show
that determining whether a strategy minimizes regret is decidable in PSPACE
The Complexity of Graph-Based Reductions for Reachability in Markov Decision Processes
We study the never-worse relation (NWR) for Markov decision processes with an
infinite-horizon reachability objective. A state q is never worse than a state
p if the maximal probability of reaching the target set of states from p is at
most the same value from q, regard- less of the probabilities labelling the
transitions. Extremal-probability states, end components, and essential states
are all special cases of the equivalence relation induced by the NWR. Using the
NWR, states in the same equivalence class can be collapsed. Then, actions
leading to sub- optimal states can be removed. We show the natural decision
problem associated to computing the NWR is coNP-complete. Finally, we ex- tend
a previously known incomplete polynomial-time iterative algorithm to
under-approximate the NWR
Approximating Euclidean by Imprecise Markov Decision Processes
Euclidean Markov decision processes are a powerful tool for modeling control
problems under uncertainty over continuous domains. Finite state imprecise,
Markov decision processes can be used to approximate the behavior of these
infinite models. In this paper we address two questions: first, we investigate
what kind of approximation guarantees are obtained when the Euclidean process
is approximated by finite state approximations induced by increasingly fine
partitions of the continuous state space. We show that for cost functions over
finite time horizons the approximations become arbitrarily precise. Second, we
use imprecise Markov decision process approximations as a tool to analyse and
validate cost functions and strategies obtained by reinforcement learning. We
find that, on the one hand, our new theoretical results validate basic design
choices of a previously proposed reinforcement learning approach. On the other
hand, the imprecise Markov decision process approximations reveal some
inaccuracies in the learned cost functions
Managing inventory and production capacity in start-up firms
We consider the problem of managing inventory and production capacity in a start-up manufacturing firm with the objective of maximising the probability of the firm surviving as well as the more common objective of maximising profit. Using Markov decision process models, we characterise and compare the form of optimal policies under the two objectives. This analysis shows the importance of coordination in the management of inventory and production capacity. The analysis also reveals that a start-up firm seeking to maximise its chance of survival will often choose to keep production capacity significantly below the profit-maximising level for a considerable time. This insight helps us to explain the seemingly cautious policies adopted by a real start-up manufacturing firm
Mean-Payoff Optimization in Continuous-Time Markov Chains with Parametric Alarms
Continuous-time Markov chains with alarms (ACTMCs) allow for alarm events
that can be non-exponentially distributed. Within parametric ACTMCs, the
parameters of alarm-event distributions are not given explicitly and can be
subject of parameter synthesis. An algorithm solving the -optimal
parameter synthesis problem for parametric ACTMCs with long-run average
optimization objectives is presented. Our approach is based on reduction of the
problem to finding long-run average optimal strategies in semi-Markov decision
processes (semi-MDPs) and sufficient discretization of parameter (i.e., action)
space. Since the set of actions in the discretized semi-MDP can be very large,
a straightforward approach based on explicit action-space construction fails to
solve even simple instances of the problem. The presented algorithm uses an
enhanced policy iteration on symbolic representations of the action space. The
soundness of the algorithm is established for parametric ACTMCs with
alarm-event distributions satisfying four mild assumptions that are shown to
hold for uniform, Dirac and Weibull distributions in particular, but are
satisfied for many other distributions as well. An experimental implementation
shows that the symbolic technique substantially improves the efficiency of the
synthesis algorithm and allows to solve instances of realistic size.Comment: This article is a full version of a paper accepted to the Conference
on Quantitative Evaluation of SysTems (QEST) 201
Maximizing the Conditional Expected Reward for Reaching the Goal
The paper addresses the problem of computing maximal conditional expected
accumulated rewards until reaching a target state (briefly called maximal
conditional expectations) in finite-state Markov decision processes where the
condition is given as a reachability constraint. Conditional expectations of
this type can, e.g., stand for the maximal expected termination time of
probabilistic programs with non-determinism, under the condition that the
program eventually terminates, or for the worst-case expected penalty to be
paid, assuming that at least three deadlines are missed. The main results of
the paper are (i) a polynomial-time algorithm to check the finiteness of
maximal conditional expectations, (ii) PSPACE-completeness for the threshold
problem in acyclic Markov decision processes where the task is to check whether
the maximal conditional expectation exceeds a given threshold, (iii) a
pseudo-polynomial-time algorithm for the threshold problem in the general
(cyclic) case, and (iv) an exponential-time algorithm for computing the maximal
conditional expectation and an optimal scheduler.Comment: 103 pages, extended version with appendices of a paper accepted at
TACAS 201
Value Iteration for Long-run Average Reward in Markov Decision Processes
Markov decision processes (MDPs) are standard models for probabilistic
systems with non-deterministic behaviours. Long-run average rewards provide a
mathematically elegant formalism for expressing long term performance. Value
iteration (VI) is one of the simplest and most efficient algorithmic approaches
to MDPs with other properties, such as reachability objectives. Unfortunately,
a naive extension of VI does not work for MDPs with long-run average rewards,
as there is no known stopping criterion. In this work our contributions are
threefold. (1) We refute a conjecture related to stopping criteria for MDPs
with long-run average rewards. (2) We present two practical algorithms for MDPs
with long-run average rewards based on VI. First, we show that a combination of
applying VI locally for each maximal end-component (MEC) and VI for
reachability objectives can provide approximation guarantees. Second, extending
the above approach with a simulation-guided on-demand variant of VI, we present
an anytime algorithm that is able to deal with very large models. (3) Finally,
we present experimental results showing that our methods significantly
outperform the standard approaches on several benchmarks
Optimizing Performance of Continuous-Time Stochastic Systems using Timeout Synthesis
We consider parametric version of fixed-delay continuous-time Markov chains
(or equivalently deterministic and stochastic Petri nets, DSPN) where
fixed-delay transitions are specified by parameters, rather than concrete
values. Our goal is to synthesize values of these parameters that, for a given
cost function, minimise expected total cost incurred before reaching a given
set of target states. We show that under mild assumptions, optimal values of
parameters can be effectively approximated using translation to a Markov
decision process (MDP) whose actions correspond to discretized values of these
parameters
- …